A new estimator of the discovery probability.
Authors
Abstract
Species sampling problems have a long history in ecological and biological studies, and they raise a number of issues, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety. Such inferential problems have recently emerged in genomic applications as well, where they exhibit peculiar features that make them more challenging: one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes), and only a small portion of the library has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it provides the degree of flexibility typically needed in this framework. Based on an observed sample of size n, the focus is on predicting a key aspect of the outcome of an additional sample of size m, namely the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n+m+1)th observation, species that have been observed with any given frequency in the enlarged sample of size n+m. This estimator admits a closed-form expression that can be evaluated exactly. The result allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species as m increases. Natural applications include the estimation of the probability of discovering rare genes within genomic libraries, and the results are illustrated by means of two expressed sequence tag datasets.
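To make the notion of a discovery probability concrete, the sketch below computes the classical Turing-Good frequency-of-frequencies estimate for the simplest case m = 0, that is, for the observation that immediately follows the basic sample. This is explicitly not the Bayesian nonparametric estimator derived in the paper, whose closed form is not given in the abstract; it is only a familiar baseline for the same kind of quantity. The function name `good_turing_discovery` and the toy sample are assumptions made for illustration.

```python
from collections import Counter


def good_turing_discovery(sample, l):
    """Turing-Good estimate of the probability that the next observation
    belongs to a species seen exactly `l` times in `sample`.

    l = 0 gives the estimated probability of discovering a new species.
    Illustrative baseline only; not the estimator proposed in the paper.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if l < 0:
        raise ValueError("l must be non-negative")
    species_counts = Counter(sample)                  # species -> frequency
    freq_of_freq = Counter(species_counts.values())   # frequency -> number of species
    # Estimate: (l + 1) * (number of species seen l + 1 times) / n
    return (l + 1) * freq_of_freq.get(l + 1, 0) / n


# Hypothetical toy sample: 7 observations from 4 species.
sample = ["a", "a", "a", "b", "b", "c", "d"]
p_new = good_turing_discovery(sample, 0)        # probability of a new species
p_singleton = good_turing_discovery(sample, 1)  # probability of a species seen once
```

In this toy sample two species are singletons, so the estimated probability of a new species is 2/7. The estimates over all frequencies l = 0, 1, 2, ... sum to one by construction.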
Related articles
On Presentation a new Estimator for Estimating of Population Mean in the Presence of Measurement error and non-Response
Introduction According to the classic sampling theory, errors that are mainly considered in the estimations are sampling errors. However, most non-sampling errors are more effective than sampling errors in properties of estimators. This has been confirmed by researchers over the past two decades, especially in relation to non-response errors that are one of the most fundamental non-immolation...
A New Ridge Estimator in Linear Measurement Error Model with Stochastic Linear Restrictions
In this paper, we propose a new ridge-type estimator called the new mixed ridge estimator (NMRE) by unifying the sample and prior information in linear measurement error model with additional stochastic linear restrictions. The new estimator is a generalization of the mixed estimator (ME) and ridge estimator (RE). The performances of this new estimator and mixed ridge estimator (MRE) against th...
A New Estimator of Entropy
In this paper we propose an estimator of the entropy of a continuous random variable. The estimator is obtained by modifying the estimator proposed by Vasicek (1976). Consistency of estimator is proved, and comparisons are made with Vasicek’s estimator (1976), van Es’s estimator (1992), Ebrahimi et al.’s estimator (1994) and Correa’s estimator (1995). The results indicate that the proposed esti...
The Zografos–Balakrishnan-log-logistic Distribution
The Zografos–Balakrishnan-log-logistic (ZBLL) distribution is a new three-parameter distribution introduced by Ramos et al. [1], who presented some of its properties, such as its probability density function, cumulative distribution function, moment generating function, hazard (failure) rate function, quantiles and moments, Rényi and Shannon ...
A New Exponential Type Estimator for the Population Mean in Simple Random Sampling
In this paper, a new exponential-type estimator of the finite population mean that uses auxiliary information under simple random sampling without replacement is introduced. The new estimator is compared with several other estimators in terms of mean square error, using two real data sets.
An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods
Imputation is one of the most common methods to reduce item non-response effects. Imputation results in a complete data set, and then it is possible to use naïve estimators. After using most of the common imputation methods, the mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...
Journal title:
- Biometrics
Volume 68, Issue 4
Pages -
Publication date: 2012